A Comparison of Pooled and Sampled Relevance Judgments in the TREC 2006 Terabyte Track

Author

  • Ian Soboroff
Abstract

Pooling is the most common technique used to build modern test collections. Evidence is mounting that pooling may not yield reusable test collections for very large document sets. This paper describes the approach taken in the TREC 2006 Terabyte Track: an initial shallow pool was judged to gather relevance information, which was then used to draw a random sample of further documents to judge. The sample judgments rank systems somewhat differently than the pool. Some analysis and plans for further research are discussed.
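
The two-stage procedure in the abstract can be illustrated with a minimal sketch. It is not the track's actual procedure: the abstract says that relevance information from the shallow pool guided the sampling, but does not say how, so the sketch simply assumes a uniform random sample of the retrieved documents outside the pool. The function names, the run data, and the values of `k` and `sample_size` are all hypothetical.

```python
import random

def depth_k_pool(runs, k):
    """Union of the top-k documents from every run (a shallow pool when k is small)."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool

def sample_beyond_pool(runs, pool, sample_size, seed=0):
    """Uniform random sample of retrieved documents that are not in the pool.

    Assumption: uniform sampling stands in for whatever sampling
    distribution the track actually derived from the pool judgments.
    """
    retrieved = {doc for ranking in runs.values() for doc in ranking}
    candidates = sorted(retrieved - pool)  # sorted for reproducibility
    rng = random.Random(seed)
    return rng.sample(candidates, min(sample_size, len(candidates)))

# Hypothetical toy runs: system name -> ranked list of document ids.
runs = {
    "sysA": ["d1", "d2", "d3", "d4", "d5"],
    "sysB": ["d2", "d6", "d1", "d7", "d8"],
}

pool = depth_k_pool(runs, k=2)                         # documents judged first
extra = sample_beyond_pool(runs, pool, sample_size=3)  # further documents to judge
```

Systems can then be scored once against the pool judgments and once against the sample judgments, and the two rankings compared.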

Similar resources

TREC 2006 Legal Track Overview

This paper describes the first year of a new TREC track focused on “e-discovery” of business records and other materials. A large collection of scanned documents produced by multiple real world discovery requests was adopted as the basis for the test collection. Topic statements were developed using a process representative of current practice in e-discovery applications, with both Boolean and ...

The Hedge Algorithm for Metasearch at TREC 2006

Aslam, Pavlu, and Savell [3] introduced the Hedge algorithm for metasearch which effectively combines the ranked lists of documents returned by multiple retrieval systems in response to a given query and learns which documents are likely to be relevant from a sequence of on-line relevance judgments. It has been demonstrated that the Hedge algorithm is an effective technique for metasearch, ofte...

A Practical Sampling Strategy for Efficient Retrieval Evaluation

We consider the problem of large-scale retrieval evaluation, with a focus on the considerable effort required to judge tens of thousands of documents using traditional test collection construction methodologies. Recently, two methods based on random sampling were proposed to help alleviate this burden: While the first method proposed by Aslam et al. is very accurate and efficient, it is also ve...

IO-Top-k at TREC 2006: Terabyte Track

This paper describes the setup and results of our contribution to the TREC 2006 Terabyte Track. Our implementation was based on the algorithms proposed in [1] “IOTop-k: Index-Access Optimized Top-K Query Processing, VLDB’06”, with a main focus on the efficiency track.

The University of Amsterdam at the TREC 2006 Terabyte Track

As part of the TREC 2006 Terabyte track, we conducted a range of experiments investigating the effects of larger test collections for both adhoc and known-item topics. In this paper, we document our official submissions to the TREC 2006 Terabyte track and conduct a number of more extensive experiments. First, we look at the amount of smoothing required for large-scale collections. Second, we inv...

Publication date: 2007